Judging It, Washing It: Scoring and Greenwashing Corporate Climate Disclosures using Large Language Models

Chuang, Marianne, Chuang, Gabriel, Chuang, Cheryl, Chuang, John

arXiv.org Artificial Intelligence

We study the use of large language models (LLMs) to both evaluate and greenwash corporate climate disclosures. First, we investigate the use of the LLM-as-a-Judge (LLMJ) methodology for scoring company-submitted reports on emissions reduction targets and progress. Second, we probe the behavior of an LLM when it is prompted to greenwash a response subject to accuracy and length constraints. Finally, we test the robustness of the LLMJ methodology against responses that may be greenwashed using an LLM. We find that two LLMJ scoring systems, numerical rating and pairwise comparison, are effective in distinguishing high-performing companies from others, with the pairwise comparison system showing greater robustness against LLM-greenwashed responses.
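The pairwise-comparison judging system described above can be sketched as a round-robin over report pairs, ranking companies by win count. This is an illustrative assumption, not the paper's implementation: `judge_pair` stands in for an LLM judge call and uses a toy keyword proxy so the sketch is runnable.

```python
from itertools import combinations

def judge_pair(report_a: str, report_b: str) -> str:
    """Stand-in for an LLM judge call. A real system would prompt the
    model to pick the stronger climate disclosure; here we score by a
    toy proxy (count of concrete keywords) so the sketch runs."""
    keywords = ("target", "reduction", "scope", "2030", "verified")
    score = lambda r: sum(r.lower().count(k) for k in keywords)
    return "A" if score(report_a) >= score(report_b) else "B"

def rank_by_pairwise_wins(reports: dict) -> list:
    """Run every pairing through the judge and rank companies by wins."""
    wins = {name: 0 for name in reports}
    for a, b in combinations(reports, 2):
        winner = a if judge_pair(reports[a], reports[b]) == "A" else b
        wins[winner] += 1
    return sorted(wins, key=wins.get, reverse=True)
```

A ranking built from all pairwise outcomes, rather than independent numerical scores, is what gives the pairwise system its robustness: a single inflated (greenwashed) response must beat every concrete competitor head-to-head to rise.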


Enhancing Collaborative Filtering Recommender with Prompt-Based Sentiment Analysis

Dang, Elliot, Hu, Zheyuan, Li, Tong

arXiv.org Artificial Intelligence

Collaborative filtering (CF) recommenders are a crucial application in online markets and e-commerce. However, CF recommenders have been shown to suffer from persistent sparsity in user ratings, which in turn leads to a cold-start issue. Existing methods address the data sparsity issue by applying token-level sentiment analysis that translates text reviews into sentiment scores as a complement to the user ratings. In this paper, we attempt to optimize the sentiment analysis with advanced NLP models, including BERT and RoBERTa, and test whether the CF recommender is further enhanced. We build the recommenders on the Amazon US Reviews dataset, and tune the pretrained BERT and RoBERTa with the traditional fine-tuning paradigm as well as the new prompt-based learning paradigm. Experimental results show that the recommender enhanced with sentiment ratings predicted by the fine-tuned RoBERTa performs best, achieving a 30.7% overall gain over the baseline recommender on MAP, NDCG, and precision at K. The prompt-based learning paradigm, although superior to the traditional fine-tuning paradigm in pure sentiment analysis, fails to further improve the CF recommender.
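The densification step this abstract describes, filling gaps in a sparse rating matrix with sentiment-derived ratings, can be sketched as follows. The polarity-to-stars mapping and the 1-5 scale are assumptions for illustration; in the paper the sentiment scores come from a fine-tuned RoBERTa model.

```python
def sentiment_to_rating(polarity: float) -> float:
    """Map a sentiment polarity in [-1, 1] to a 1-5 star rating
    (an assumed linear mapping, not the paper's exact scheme)."""
    return 1.0 + 2.0 * (polarity + 1.0)

def densify(ratings: dict, sentiments: dict) -> dict:
    """Where an explicit rating is missing (None) but a review sentiment
    exists for the same (user, item) key, substitute the sentiment-derived
    rating; otherwise keep the original entry."""
    filled = {}
    for key, r in ratings.items():
        if r is None and key in sentiments:
            filled[key] = sentiment_to_rating(sentiments[key])
        else:
            filled[key] = r
    return filled
```

The denser matrix then feeds a standard CF recommender unchanged, which is why sentiment quality, not recommender architecture, drives the reported gains.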


Some Ethical Issues in the Review Process of Machine Learning Conferences

Russo, Alessio

arXiv.org Machine Learning

Recent successes in the Machine Learning community have led to a steep increase in the number of papers submitted to conferences. This increase has made more prominent some of the issues that affect the review process currently used by these conferences. The review process has several issues that may undermine the nature of scientific research, which should be fully objective, apolitical, unbiased, and free of misconduct (such as plagiarism, cheating, improper influence, and other improprieties). In this work, we study the problems of reviewer recruitment, infringements of the double-blind process, fraudulent behavior, biases in numerical ratings, and the appendix phenomenon (i.e., the growing practice of publishing results in the appendix of a paper). For each of these problems, we provide a short description and possible solutions. The goal of this work is to raise awareness in the Machine Learning community regarding these issues.


Sentiment Analysis based Multi-person Multi-criteria Decision Making Methodology: Using Natural Language Processing and Deep Learning for Decision Aid

Zuheros, Cristina, Martínez-Cámara, Eugenio, Herrera-Viedma, Enrique, Herrera, Francisco

arXiv.org Artificial Intelligence

Over time, different models have emerged to help us solve decision making (DM) problems. In particular, multi-person multi-criteria decision making (MpMcDM) models consider the evaluations of multiple experts to solve a decision situation, analyzing all possible solution alternatives according to several criteria [45]. A computational DM process, like the human one, requires useful, complete, and insightful information in order to make the most adequate decision from the input information. The input of DM models is usually a set of evaluations from the experts. The experts wish to express their evaluations in natural language, but raw text cannot be directly processed by DM models. Accordingly, several approaches are followed for eliciting and building a computational representation of the evaluations, namely: (1) using a numerical representation of the evaluations [35], and (2) using a predefined set of linguistic terms [13]. These approaches for eliciting evaluations constrain the evaluative expressiveness of the experts, because they have to adapt their evaluation to the available numerical or linguistic alternatives. We claim that experts in a DM problem should be able to express their evaluations in natural language, and that the DM model has to be able to process and computationally represent them. Natural language processing (NLP) is the artificial intelligence area that combines linguistic and computational backgrounds for understanding and generating human language [16, 28].
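The pipeline the excerpt argues for, free-text expert evaluations turned into a computable decision score, can be sketched as: score each evaluation per criterion with a sentiment function, average over experts, and weight the criteria. The toy lexicon, function names, and weighting scheme are assumptions for illustration; the paper proposes deep sentiment models for the scoring step.

```python
def naive_sentiment(text: str) -> float:
    """Toy lexicon-based polarity in [-1, 1]; a real system would use a
    trained sentiment model rather than word counting."""
    pos = ("excellent", "good", "reliable")
    neg = ("poor", "bad", "unreliable")
    words = text.lower().split()
    hits = sum(w in pos for w in words) - sum(w in neg for w in words)
    return max(-1.0, min(1.0, hits / max(len(words), 1) * 5))

def aggregate(evaluations: dict, weights: dict) -> float:
    """evaluations: {criterion: [expert evaluation texts]};
    weights: {criterion: weight}. Returns one weighted score for a
    single decision alternative."""
    total = 0.0
    for criterion, texts in evaluations.items():
        mean = sum(naive_sentiment(t) for t in texts) / len(texts)
        total += weights[criterion] * mean
    return total
```

Ranking alternatives by this aggregate score lets experts keep writing in natural language while the model does the numerical comparison.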


Transfer Learning to Predict Missing Ratings Via Heterogeneous User Feedbacks

Pan, Weike (Hong Kong University of Science and Technology) | Liu, Nathan N. (Hong Kong University of Science and Technology) | Xiang, Evan W. (Hong Kong University of Science and Technology) | Yang, Qiang (Hong Kong University of Science and Technology)

AAAI Conferences

Data sparsity due to missing ratings is a major challenge for collaborative filtering (CF) techniques in recommender systems. This is especially true for CF domains where the ratings are expressed numerically. We observe that, while we may lack the information in numerical ratings, we may have more data in the form of binary ratings.  This is especially true when users can easily express themselves with their likes and dislikes for certain items.  In this paper, we explore how to use the binary preference data expressed in the form of like/dislike to help reduce the impact of data sparsity of more expressive numerical ratings.  We do this by transferring the rating knowledge from some auxiliary data source in binary form (that is, likes or dislikes), to a target numerical rating matrix. Our solution is to model both numerical ratings and like/dislike in a principled way, using a novel framework of Transfer by Collective Factorization (TCF). In particular, we construct the shared latent space collectively and learn the data-dependent effect separately. A major advantage of the TCF approach over previous collective matrix factorization (or bi-factorization) methods is that we are able to capture the data-dependent effect when sharing the data-independent knowledge, so as to increase the overall quality of knowledge transfer. Experimental results demonstrate the effectiveness of TCF at various sparsity levels as compared to several state-of-the-art methods.
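The shared-latent-space idea behind collective factorization can be sketched with a joint SGD loop: observed entries of both the numerical target matrix and the auxiliary binary matrix update the same user/item factors. This is a minimal illustrative sketch, not the paper's TCF algorithm; rank, learning rate, the auxiliary weight `alpha`, and the absence of TCF's separately learned data-dependent effect are all simplifying assumptions.

```python
import random

def dot(u, v):
    """Inner product of two factor vectors."""
    return sum(a * b for a, b in zip(u, v))

def collective_factorize(R, B, n_users, n_items, k=2, lr=0.02,
                         alpha=0.5, epochs=1000, seed=0):
    """Jointly factorize a numerical-rating matrix R and an auxiliary
    binary like/dislike matrix B (both {(user, item): value} dicts of
    observed entries) with shared factors U, V, so the denser binary
    data informs predictions on the sparse numerical target."""
    rng = random.Random(seed)
    U = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        # Entries from either matrix nudge the shared factors;
        # alpha down-weights the auxiliary binary data.
        for data, w in ((R, 1.0), (B, alpha)):
            for (u, i), target in data.items():
                err = target - dot(U[u], V[i])
                for f in range(k):
                    u_f = U[u][f]  # keep old value for V's update
                    U[u][f] += lr * w * err * V[i][f]
                    V[i][f] += lr * w * err * u_f
    return U, V
```

Missing numerical ratings are then predicted as `dot(U[u], V[i])`; the point of sharing U and V is that like/dislike observations shape those factors even for users with few numerical ratings.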